fix: use name-mapped field IDs for projection and predicate pushdown by viirya · Pull Request #2612 · apache/iceberg-rust

viirya · 2026-06-10T01:20:30Z

Which issue does this PR close?

Closes BUG: Parquet name mapping still uses position fallback for projection and predicate pushdown #2403.

What changes are included in this PR?

When a Parquet file lacks embedded field IDs but a name mapping (schema.name-mapping.default) is available, the reader applies the mapping to the Arrow schema — but it still planned column projection and predicate pushdown with the position-based fallback (field id N → column N-1). As a result:

Projected columns whose mapped field IDs don't line up with their physical positions were silently read as all-NULL.
Predicates were evaluated against the wrong physical columns, silently returning wrong rows (the same mismatch also causes row-group pruning and row selection to prune the wrong data).

This contradicts Java's ReadConf, which uses a three-branch strategy: embedded IDs → field-id projection; name mapping → apply mapping, then field-id projection; neither → position fallback. Our code applied the name mapping to the Arrow schema but then took the fallback branch anyway.
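The three-branch strategy can be sketched as follows. This is a minimal illustration, not the code in pipeline.rs: the enum `ColumnResolution` and the function `choose_resolution` are hypothetical names introduced here, and only the `use_position_fallback` condition mirrors the expression described in this PR.

```rust
/// Hypothetical enum (not part of iceberg-rust) naming the three ways a
/// reader can resolve Iceberg field IDs to physical Parquet columns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnResolution {
    /// The file carries embedded field IDs: project by field ID.
    EmbeddedFieldIds,
    /// No embedded IDs, but a name mapping exists: apply it, then project by field ID.
    NameMappedFieldIds,
    /// Neither is available: fall back to position (field id N -> column N-1).
    PositionFallback,
}

/// Hypothetical helper that picks a branch in the order described above.
fn choose_resolution(has_embedded_field_ids: bool, has_name_mapping: bool) -> ColumnResolution {
    if has_embedded_field_ids {
        ColumnResolution::EmbeddedFieldIds
    } else if has_name_mapping {
        ColumnResolution::NameMappedFieldIds
    } else {
        ColumnResolution::PositionFallback
    }
}

/// Mirrors the condition from this PR:
/// `missing_field_ids && task.name_mapping.is_none()`.
fn use_position_fallback(missing_field_ids: bool, has_name_mapping: bool) -> bool {
    missing_field_ids && !has_name_mapping
}
```

Under these assumptions, a file with no embedded IDs and an available name mapping resolves to `NameMappedFieldIds`, and `use_position_fallback(true, true)` is `false`. That is the case the old code sent down the fallback branch.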

Changes:

pipeline.rs: compute use_position_fallback = missing_field_ids && task.name_mapping.is_none() and pass it (instead of missing_field_ids) to both get_arrow_projection_mask and build_field_id_set_and_map, so position fallback only applies when there are no embedded field IDs and no name mapping.
projection.rs: build_field_id_set_and_map now distinguishes the name-mapping case. When the Parquet descriptor has no embedded field IDs but a name mapping assigned IDs to the Arrow schema, it builds the field-id → leaf-column map from the Arrow schema's PARQUET:field_id metadata (new build_field_id_map_from_arrow_schema). This relies on Arrow leaf ordering matching Parquet leaf column ordering (both depth-first), which is the same invariant get_arrow_projection_mask already relies on for ProjectionMask::leaves.
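To illustrate the depth-first invariant, here is a minimal sketch of building a field-id → leaf-index map. It is not the actual build_field_id_map_from_arrow_schema: `SketchField` is a hypothetical stand-in for an Arrow field whose `field_id` plays the role of the PARQUET:field_id metadata entry, and `field_id_to_leaf_index` and `position_fallback_leaf` are names invented for this example. The real implementation works on Arrow schema types and may treat lists and maps differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field. `field_id` models the value a
/// name mapping would store under the PARQUET:field_id metadata key.
struct SketchField {
    field_id: Option<i32>,
    /// Empty for leaf (primitive) fields, non-empty for nested fields.
    children: Vec<SketchField>,
}

/// Walk one field depth-first, assigning consecutive indices to leaves.
fn collect_leaves(field: &SketchField, next_leaf: &mut usize, map: &mut HashMap<i32, usize>) {
    if field.children.is_empty() {
        // Only leaves occupy a physical column; record the ID if one was assigned.
        if let Some(id) = field.field_id {
            map.insert(id, *next_leaf);
        }
        *next_leaf += 1;
    } else {
        for child in &field.children {
            collect_leaves(child, next_leaf, map);
        }
    }
}

/// Build a field-id -> leaf-column-index map from the top-level fields.
fn field_id_to_leaf_index(top_level: &[SketchField]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    let mut next_leaf = 0usize;
    for field in top_level {
        collect_leaves(field, &mut next_leaf, &mut map);
    }
    map
}

/// The position-based rule quoted in this PR: field id N -> column N-1.
fn position_fallback_leaf(field_id: i32) -> Option<usize> {
    if field_id >= 1 {
        Some((field_id - 1) as usize)
    } else {
        None
    }
}
```

With two top-level leaves tagged with field IDs 2 and 4, this sketch maps field ID 2 to leaf 0 and field ID 4 to leaf 1. The N-1 rule would instead point field ID 2 at leaf 1 and field ID 4 at leaf 3, which does not exist in a two-column file.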

Are these changes tested?

Two new regression tests in projection.rs, covering both broken paths:

test_read_parquet_with_name_mapping_uses_mapped_field_ids: a file without field IDs whose columns [name, subdept] map to non-contiguous field IDs (2, 4). On main, projection NULL-fills both columns; with this fix, the values are read correctly.
test_predicate_on_name_mapped_file_uses_mapped_field_ids: a predicate name = "Alice" on the same file shape, with row-group filtering and row selection enabled. On main, the predicate is evaluated against the wrong physical column and returns 2 rows; with this fix, it returns the single correct row.

Both tests fail on main (verified by reverting the pipeline.rs strategy line) and pass with the fix. The full arrow:: unit-test suite (97 tests) passes.

When a Parquet file lacks embedded field IDs but a name mapping (schema.name-mapping.default) is available, the reader applied the mapping to the Arrow schema yet still planned column projection and predicate pushdown with the position-based fallback (field id N -> column N-1). Columns whose mapped field IDs do not line up with their physical positions were read as NULL, and predicates were evaluated against the wrong physical columns, silently returning wrong rows.

Position fallback now applies only when the file has no embedded field IDs AND no name mapping is available, matching Java's three-branch ReadConf strategy. With a name mapping, projection and predicate planning resolve columns through the field IDs the mapping assigned to the Arrow schema.

Closes apache#2403

viirya mentioned this pull request Jun 10, 2026

Extract name mapping from table properties #2519

Open

viirya wants to merge 1 commit into apache:main from viirya:fix/2403-name-mapping-projection
